    • List of Articles feature selection

      • Open Access Article

        1 - Determination of Optimum SVMs Based on Genetic Algorithm in Classification of Hyper spectral Imagery
        Farhad Samadzadegan, Hadise Hassani
        Hyperspectral remote sensing imagery, with its rich spectral information, provides an efficient tool for ground classification in complex geographical areas with similar classes. Because Support Vector Machines (SVMs) are robust in high-dimensional spaces, they are an efficient tool for classifying hyperspectral imagery. However, two optimization issues strongly affect SVM performance: determining the optimum SVM parameters and selecting the optimum feature subset. Traditional optimization algorithms are appropriate for limited search spaces, but they usually become trapped in local optima in high-dimensional spaces; meta-heuristic optimization algorithms such as the Genetic Algorithm (GA) are therefore needed to approach a globally optimal solution. This paper evaluates the potential of different GA-based optimization scenarios for determining SVM parameters and selecting feature subsets. Results on AVIRIS hyperspectral imagery demonstrate that superior SVM performance is achieved by simultaneously optimizing the SVM parameters and the input feature subset. For the Gaussian and polynomial kernels, classification accuracy improves by about 5% and 15%, respectively, and more than 90 redundant bands are eliminated. For comparison, the same evaluation is performed with Simulated Annealing (SA); the Genetic Algorithm performs better, especially in the complex search space where parameter determination and feature selection are solved simultaneously.
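The joint search described in this abstract can be sketched as a genetic algorithm whose chromosome holds one bit per band followed by two real-valued SVM parameters. Everything concrete below is an assumption made for illustration: the band count, the parameter ranges, the operators, and above all the `fitness` function, which is a stand-in for cross-validated SVM accuracy and not the paper's objective.

```python
import random

N_FEATURES = 12            # hypothetical number of spectral bands
INFORMATIVE = {0, 3, 7}    # hypothetical "useful" bands used by the stand-in fitness

def random_chromosome(rng):
    """Joint encoding: one bit per band, then two real-valued SVM parameters."""
    mask = [rng.randint(0, 1) for _ in range(N_FEATURES)]
    log_c = rng.uniform(-2.0, 4.0)       # log10 of the penalty parameter C
    log_gamma = rng.uniform(-4.0, 1.0)   # log10 of the Gaussian kernel parameter
    return mask + [log_c, log_gamma]

def fitness(chrom):
    """Stand-in for cross-validated SVM accuracy (an assumption, not the paper's)."""
    mask, log_c, log_gamma = chrom[:N_FEATURES], chrom[-2], chrom[-1]
    hits = sum(1 for i in INFORMATIVE if mask[i])
    extras = sum(mask) - hits            # redundant bands are penalized
    param_penalty = 0.02 * ((log_c - 1.0) ** 2 + (log_gamma + 1.0) ** 2)
    return hits / len(INFORMATIVE) - 0.03 * extras - param_penalty

def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover; gene positions are preserved."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(chrom, rng, rate=0.1):
    """Flip mask bits with probability `rate`; jitter and clamp the parameters."""
    child = list(chrom)
    for i in range(N_FEATURES):
        if rng.random() < rate:
            child[i] = 1 - child[i]
    child[-2] = min(4.0, max(-2.0, child[-2] + rng.gauss(0.0, 0.3)))
    child[-1] = min(1.0, max(-4.0, child[-1] + rng.gauss(0.0, 0.3)))
    return child

def run_ga(generations=40, pop_size=30, seed=0):
    """Evolve the population with elitism; return (best chromosome, initial best fitness)."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in pop)
    for _ in range(generations):
        elite = max(pop, key=fitness)    # the best individual always survives
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
    return max(pop, key=fitness), initial_best

best, initial_best = run_ga()
selected_bands = [i for i, bit in enumerate(best[:N_FEATURES]) if bit]
```

In a real pipeline, `fitness` would train an SVM on the bands where the mask is 1, using `C = 10 ** log_c` and the kernel parameter `10 ** log_gamma`, and return a validation accuracy. Encoding both parts in one chromosome is what lets the search trade off band selection against parameter choice in a single run.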
      • Open Access Article

        2 - An Improved Method for Detecting Phishing Websites Using Data Mining on Web Pages
        Mahdiye Baharloo, Alireza Yari
        Phishing reduces trust among users of business networks built on e-commerce frameworks. In this research, we therefore tried to detect phishing websites using data mining. Identifying the distinguishing features of phishing is an important prerequisite for designing an accurate detection system. To that end, a list of 30 features associated with phishing websites was first prepared. Then, a two-stage feature reduction method based on feature selection and feature extraction was proposed to enhance the efficiency of phishing detection systems; it reduced the number of features significantly. Finally, the performance of the J48 decision tree, random forest, and naïve Bayes methods was evaluated on the reduced features. The results indicated that the model built with the two-stage feature reduction, based on a wrapper method and Principal Component Analysis (PCA), reached an accuracy of 96.58% with the random forest method, which is a desirable outcome compared to other methods.
      • Open Access Article

        3 - Feature selection for author identification of Persian online short texts
        Somayeh Arefi, Mohamad Ehsan Basiri, Omid Roozmand
        The growing use of social media and online communication to express opinions and exchange ideas, together with the expanding use of these platforms by Persian users, has increased the volume of Persian text on the Web. This remarkable growth, along with abuse of writers' anonymity, reveals the need for an automatic author identification system for this language. This study investigates the factors affecting the identification of authors of Persian reviews written by cell-phone buyers and evaluates supervised and unsupervised methods. The factors considered include lexical, syntactic, semantic, structural, grammatical, text-specific, and social-network-specific features. After these features are extracted, the best features are selected with four algorithms: feature correlation, gain ratio, OneR, and principal component analysis. K-means, EM, and density-based clustering are then used for clustering, and Bayesian network, random forest, and Bagging are used for classification. Evaluation on Persian comments by Samsung phone buyers indicates that the best clustering performance, 59.16%, is obtained by the EM algorithm on the top 15 features selected by OneR, while the best classification performance, 79.57%, is achieved by the random forest algorithm using the top 90 features selected by gain ratio. A comparison of the feature groups showed that syntactic features had the greatest effect on identifying the author of short texts, followed by lexical, text-specific, social-network-specific, structural, grammatical, and semantic features, in that order.
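Gain ratio, one of the four ranking criteria named in this abstract, is the information gain of a feature divided by the entropy of the feature's own value distribution (its split information). A minimal sketch of ranking features this way follows; the feature names and the four toy samples are hypothetical, and the sketch assumes features that are already discrete, so numeric stylometric features would need to be binned first.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting the samples on the feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0

def top_k_features(columns, labels, k):
    """Names of the k features with the highest gain ratio."""
    scores = {name: gain_ratio(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: four short texts written by two authors.
authors = ["a", "a", "b", "b"]
columns = {
    "uses_emoji": [1, 1, 0, 0],       # separates the two authors perfectly
    "long_sentences": [1, 0, 1, 0],   # carries no information about the author
}
ranked = top_k_features(columns, authors, k=1)
```

Dividing by the split information penalizes features with many distinct values, which plain information gain tends to favor.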
      • Open Access Article

        4 - Analysing students' learning through morning exercise using data mining techniques
        Behzad Lak, Narges Abbasi
        Since school has been identified as one of the major agents in the socialization process, it holds a remarkable position in the educational system of any country. Improving student learning is also a key factor in enhancing the quality of the educational system in schools. As regular exercise has a profoundly positive impact on learning, this paper aims to provide an approach to enhancing students' learning process through morning exercise, based on an artificial neural network (ANN) and the intelligent water drop optimization algorithm. The study is quantitative, descriptive-analytical in purpose, and applied in method. The ANN was used to classify and extract the results, and the intelligent water drop optimization algorithm was employed for feature selection. In the ANN, eleven neurons were selected as the appropriate number of hidden-layer neurons; a combination of linear and sigmoidal activation functions was employed as the interlayer transfer functions; a training function was applied to train the network; and a maximum of 3000 iterations was set for the training algorithm on the dataset. The accuracy of the proposed method was 68%, an improvement of about 2.2% over the basic method, indicating that exercise has a positive effect on students' learning. The results showed proper performance of the optimal classification on the dataset with homogeneous parameters, as well as better performance of the artificial neural networks than the novel methods. Accordingly, the proposed method can appropriately improve output accuracy in strengthening the learning process.
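The network shape mentioned in this abstract can be illustrated with a forward pass through one hidden layer of eleven neurons, applied to the inputs kept by a feature-selection mask. The abstract does not say which layer uses which activation, so the sketch assumes sigmoid hidden units and a linear output; the weights, the mask, and the sample values are hypothetical, and no training or intelligent water drop search is shown.

```python
import math
import random

HIDDEN_UNITS = 11  # hidden-layer size reported in the abstract

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_inputs, rng):
    """Small random weights; each row holds the input weights plus a trailing bias."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(HIDDEN_UNITS)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_UNITS + 1)]
    return hidden, output

def forward(network, features, mask):
    """Sigmoid hidden layer followed by a linear output (assumed arrangement)."""
    hidden, output = network
    # Keep only the inputs chosen by the feature-selection step.
    x = [v for v, keep in zip(features, mask) if keep]
    h = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1]) for row in hidden]
    return sum(w * v for w, v in zip(output[:-1], h)) + output[-1]

rng = random.Random(0)
mask = [1, 0, 1, 1]               # hypothetical result of feature selection
sample = [0.2, 0.9, 0.4, 0.7]     # hypothetical normalized student record
network = init_network(sum(mask), rng)
score = forward(network, sample, mask)
```

With trained weights, `score` would be thresholded or compared across classes to produce the classification that the reported accuracy refers to.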
      • Open Access Article

        5 - Fake Websites Detection Improvement Using Multi-Layer Artificial Neural Network Classifier with Ant Lion Optimizer Algorithm
        Farhang Padidaran Moghaddam, Mahshid Sadeghi B.
        In phishing attacks, a fake site is forged to imitate an original site and looks very similar to it. To direct users to these sites, phishers (online thieves) usually put fake links in emails sent to their victims and use social engineering methods to persuade them to click on those links. Phishing attacks cause significant financial losses, and most attacks focus on banks and financial gateways. Machine learning methods are an effective way to detect phishing attacks, but their effectiveness depends on selecting optimal features. Feature selection allows only important features to be used as learning input and reduces the detection error for phishing attacks. In the proposed method, a multilayer artificial neural network classifier is used to reduce the detection error, and the feature selection phase is performed by the ant lion optimization (ALO) algorithm. Evaluations and experiments on the Rami dataset, which is related to phishing, show that the proposed method has an accuracy of about 98.53% and a lower error than the multilayer artificial neural network. The proposed method is more accurate in detecting phishing attacks than the BPNN, SVM, NB, C4.5, RF, and kNN learning methods with feature selection by the PSO algorithm.